Under the framework of spectral clustering, the key to subspace clustering is building a similarity graph that describes the neighborhood relations among data points. Some recent works build the graph using sparse, low-rank, and $\ell_2$-norm-based representations, and have achieved state-of-the-art performance. However, these methods suffer from two limitations. First, their time complexity is at least proportional to the cube of the data size, which makes them inefficient for large-scale problems. Second, they cannot cope with out-of-sample data, that is, data not used to construct the similarity graph. To cluster each out-of-sample datum, these methods have to recalculate the similarity graph and the cluster membership of the whole data set. In this paper, we propose a unified framework that makes representation-based subspace clustering algorithms feasible for clustering both out-of-sample and large-scale data. Under our framework, the large-scale problem is tackled by converting it into an out-of-sample problem in the manner of "sampling, clustering, coding, and classifying". Furthermore, we give an estimate of the error bounds by treating each subspace as a point in a hyperspace. Extensive experimental results on various benchmark data sets show that our methods outperform several recently proposed scalable methods in clustering large-scale data sets.
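The "sampling, clustering, coding, and classifying" pipeline can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes uniform random sampling, a ridge ($\ell_2$-norm) self-representation for the in-sample graph, and a plain minimum-residual rule for classifying out-of-sample points. All function names (`ridge_codes`, `cluster_large_scale`, `make_demo_data`) and parameter values are hypothetical, and the toy data lie on mutually orthogonal subspaces so that the behavior is easy to check.

```python
import numpy as np

def ridge_codes(D, Y, lam):
    """Solve min_C ||Y - D C||_F^2 + lam ||C||_F^2 (columns are data points)."""
    G = D.T @ D
    return np.linalg.solve(G + lam * np.eye(G.shape[0]), D.T @ Y)

def kmeans(Z, k, iters=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def spectral_clustering(W, k):
    """Normalised spectral clustering on a symmetric affinity matrix W."""
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = vecs[:, :k]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U, k)

def cluster_large_scale(X, k, n_in, lam=0.1, seed=0):
    """Cluster the columns of X; only n_in of them enter the similarity graph."""
    rng = np.random.default_rng(seed)
    X = X / np.maximum(np.linalg.norm(X, axis=0, keepdims=True), 1e-12)
    n = X.shape[1]
    # 1) Sampling: pick in-sample points uniformly at random (an assumption).
    in_idx = rng.choice(n, size=n_in, replace=False)
    out_idx = np.setdiff1d(np.arange(n), in_idx)
    X_in, X_out = X[:, in_idx], X[:, out_idx]
    # 2) Clustering: ridge self-representation graph + spectral clustering.
    C = ridge_codes(X_in, X_in, lam)
    np.fill_diagonal(C, 0.0)
    in_labels = spectral_clustering(np.abs(C) + np.abs(C).T, k)
    # 3) Coding: represent each out-of-sample point over the in-sample data.
    codes = ridge_codes(X_in, X_out, lam)
    # 4) Classifying: assign each point to the cluster with the smallest
    #    reconstruction residual (plain residual rule, an assumption).
    residuals = np.stack([
        np.linalg.norm(
            X_out - X_in[:, in_labels == j] @ codes[in_labels == j], axis=0)
        for j in range(k)
    ])
    labels = np.empty(n, dtype=int)
    labels[in_idx] = in_labels
    labels[out_idx] = residuals.argmin(axis=0)
    return labels

def make_demo_data(k=3, dim=3, per_cluster=200, seed=1):
    """Toy data: k mutually orthogonal subspaces on disjoint coordinate blocks."""
    rng = np.random.default_rng(seed)
    X = np.zeros((k * dim, k * per_cluster))
    truth = np.repeat(np.arange(k), per_cluster)
    for j in range(k):
        X[j * dim:(j + 1) * dim, truth == j] = rng.standard_normal((dim, per_cluster))
    return X, truth

X, truth = make_demo_data()
labels = cluster_large_scale(X, k=3, n_in=60)
```

In this sketch the cubic-cost step (the eigendecomposition) runs only on the `n_in` sampled points, while every remaining point costs one linear solve against the in-sample data, which is the sense in which the large-scale problem is reduced to an out-of-sample one.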